Abstract

Specialized intelligent systems can be found everywhere: fingerprint, handwriting,
speech, and face recognition, spam filtering, chess and other game programs,
robots, etc. This decade the first presumably complete mathematical theory of
artificial intelligence based on universal induction-prediction-decision-action
has been proposed. This information-theoretic approach solidifies the foundations
of inductive inference and artificial intelligence. Getting the foundations right
usually marks significant progress and the maturing of a field. The theory provides
a gold standard and guidance for researchers working on intelligent algorithms.
The roots of universal induction were laid exactly half a century ago and the roots
of universal intelligence exactly one decade ago. So it is timely to take stock of
what has been achieved and what remains to be done. Since there are already good
recent surveys, I describe the state-of-the-art only in passing and refer the reader
to the literature. This article concentrates on the open problems in universal
induction and its extension to universal intelligence.

"The mathematician is by now accustomed to intractable equations, and
even to unsolved problems, in many parts of his discipline. However,
it is still a matter of some fascination to realize that there are parts
of mathematics where the very construction of a precise mathematical
statement of a verbal problem is itself a problem of major difficulty."

- Richard Bellman, Adaptive Control Processes (1961) p. 194 



1 Introduction 

What is a good model of the weather changes? Are there useful models of the world 
economy? What is the true regularity behind the number sequence 1,4,9,16,...? 
What is the correct relationship between mass, force, and acceleration of a physical 
object? Is there a causal relation between interest rates and inflation? Are models 
of the stock market purely descriptive or do they have any predictive power? 

Induction. The questions above look like a set of unrelated inquiries. What they
have in common is that they seem to be amenable to scientific investigation. They
all ask about a model for or relation between observations. The purpose seems to be
to explain or understand the data. Generalizing from data to general rules is called
inductive inference, a core problem in philosophy [Hum39, Pop34, How03] and a key
task of science [Lev74, Ear93, Wal05].

But why do or should we care about modeling the world? Because this is what
science is about [Sal06]? As indicated above, models should be good, useful, true,
correct, causal, predictive, or descriptive [FH06]. Digging deeper, we see that models
are mostly used for prediction in related but new situations, especially for predicting
future events [Wik08].

Predictions. Consider the apparently only slight variation of the questions above: 
What is the correct answer in an IQ test asking to continue the sequence 1,4,9,16,...? 
Given historic stock-charts, can one predict the quotes of tomorrow? Or questions 
like: Assuming the sun rose every day for 5000 years, how likely is doomsday (that 
the sun will not rise) tomorrow? What is my risk of dying from cancer next year? 

These questions are instances of the important problem of time-series forecasting,
also called sequence prediction [BD02, CBL06]. While inductive inference is about
finding models or hypotheses that explain the data (whatever explain actually shall
mean), prediction is concerned with forecasting the future. Finding models is
interesting and useful, since they usually help us to (partially) answer such predictive
questions [Gei93, Cha03b]. While the usefulness of predictions is clearer to the
layman than the purpose of the scientific inquiry for models, one may again ask,
why do or should we care about making predictions?

Decisions. Consider the following questions: Shall I take my umbrella or wear
sunglasses today? Shall I invest my assets in stocks or bonds? Shall I skip work
today because it might be my last day on earth? Shall I irradiate or remove the
tumor of my patient? These questions ask for decisions that have some (minor to
drastic) consequences. We usually want to make "good" decisions, where the quality
is measured in terms of some reward (money, life expectancy) or loss [Fer67, DeG70,
Jef83]. In order to compute this reward as a function of our decision, we need to
predict the environment: whether there will be rain or sunshine today, whether the
market will go up or down, whether doomsday is tomorrow, or which type of cancer
the patient has. Often forecasts are uncertain [Par95], but this is still better than
no prediction. Once we have arrived at a (hopefully good) decision, what do we do next?

Actions. The obvious thing is to execute the decision, i.e. to perform some action
consistent with the decision arrived at. The action may not influence the environment:
taking an umbrella versus sunglasses does not influence the future weather
(ignoring the butterfly effect), nor do small stock trades influence the market.
These settings are called passive [Hut03d], and the action part is of marginal
importance and usually not discussed. On the other hand, a patient might die from
a wrong treatment, or a chess player may lose a piece and possibly the whole game
by making one mistake. These settings are called (re)active [Hut07c], and their
analysis is immensely more involved than the passive case [Ber06].

And now? There are many theories and algorithms and whole research fields and
communities dealing with some aspects of induction, prediction, decision, or action.
Some of them will be detailed below. Finding solutions for every particular (new)
problem is possible and useful for many specific applications. The trouble is that this
approach is cumbersome and prone to disagreement or contradiction [Kem03]. Some
researchers feel that this is the nature of their discipline and one can do little about
it [KLW06]. But in science (in particular math, physics, and computer science)
previously separate approaches are constantly being unified towards more and more
powerful theories and algorithms [GSW00, Gre00]. There is at least one field where
we must put everything (induction+prediction+decision+action) together in a
completely formal (preferably elegant) way, namely Artificial Intelligence [RN03]. Such
a general and formal theory of AI was invented about a decade ago [Hut00].

Contents. In Section 2 I will give a brief introduction to this universal theory
of AI. It is based on an unexpected unification of algorithmic information theory
and sequential decision theory. The corresponding AIXI agent is the first sound,
complete, general, rational agent in any relevant but unknown environment with
reinforcement feedback [Hut05, QC06]. It is likely the best possible such agent in a
sense to be explained below.

Section 3 describes the historic origin of the AIXI model. One root is
Solomonoff's theory [Sol60] of universal induction, which is closely connected to
algorithmic complexity. The other root is Bellman's adaptive control theory [Bel57]
for optimal sequential decision making. Both theories are now half-a-century old.
From an algorithmic information theory perspective, AIXI generalizes optimal passive
universal induction to the case of active agents. From a decision-theoretic
perspective, AIXI is a universal Bayes-optimal learning algorithm.

Section 4 and the sections following it constitute the core of this article, describing
the open problems around universal induction & intelligence. Most of them are taken
from the book [Hut05] and paper [Hut07b]. I focus on questions whose solution has a
realistic chance of advancing the field. I avoid technical open problems whose global
significance is questionable.

Solomonoff's half-a-century-old theory of universal induction is already well
developed. Naturally, most remaining open problems are either philosophically or
technically deep.

Its generalization to Universal Artificial Intelligence seems to be quite intricate. 
While the AIXI model itself is very elegant, its analysis is much more cumbersome. 
Although AIXI has been shown to be optimal in some senses, a convincing notion 
of optimality is still lacking. Convergence results also exist, but are much weaker 
than in the passive case. 

Its construction makes it plausible that AIXI is the optimal rational general 
learning agent, but unlike the induction case, victory cannot be claimed yet. It 
would be natural, hence, to compare AIXI to alternatives, if there were any. Since 
there are no competitors yet, one could try to create some. Finally, AIXI is only 
"essentially" unique, which gives rise to some more open questions. 

Given that AI is about designing intelligent systems, a serious attempt should 
be made to formally define intelligence in the first place. Astonishingly there have 
been not too many attempts. There is one definition that is closely related to AIXI, 
but its properties have yet to be explored. 

The final Section briefly discusses the flavor, feasibility, difficulty, and inter- 
estingness of the raised questions, and takes a step back and briefly compares the 
information-theoretic approach to AI discussed in this article to others. 

2 Universal Artificial Intelligence 

Artificial Intelligence. The science of artificial intelligence (AI) may be defined as
the construction of intelligent systems (artificial agents) and their analysis [RN03].
A natural definition of a system is anything that has an input and an output stream,
or equivalently an agent that acts and observes. Intelligence is more complicated.
It can have many faces like creativity, solving problems, pattern recognition,
classification, learning, induction, deduction, building analogies, optimization, surviving
in an environment, language processing, planning, and knowledge acquisition and
processing. Informally, AI is concerned with developing agents that perform well
in a large range of environments [LH07c]. A formal definition incorporating every
aspect of intelligence, however, seems difficult. In order to solve this problem we
need to solve the induction, prediction, decision, and action problem, which seems
like a daunting (some even claim impossible) task: Intelligent actions are based on
informed decisions. Attaining good decisions requires predictions which are typically
based on models of the environments. Models are constructed or learned from past
observations via induction. Fortunately, based on the deep philosophical insights and
powerful mathematical developments listed in Section 3, these problems have been
overcome, at least in theory.

Universal Artificial Intelligence (UAI). Most, if not all, known facets of
intelligence can be formulated as goal driven or, more precisely, as maximizing some
reward or utility function. It is, therefore, sufficient to study goal-driven AI; e.g. the
(biological) goal of animals and humans is to survive and spread. The goal of AI
systems should be to be useful to humans. The problem is that, except for special
cases, we know neither the utility function nor the environment in which the agent
will operate in advance. What do we need (from a mathematical point of view) to
construct a universal optimal learning agent interacting with an arbitrary unknown
environment? The theory, coined AIXI, developed in this decade and explained
in [Hut05] says: All you need is Occam [Fra02], Epicurus [Asm84], Turing [Tur36],
Bayes [Bay63], Solomonoff [Sol64], Kolmogorov [Kol65], and Bellman [Bel57]:
Sequential decision theory [Ber06] (Bellman's equation) formally solves the problem of
rational agents in uncertain worlds if the true environmental probability distribution
is known. If the environment is unknown, Bayesians [Ber93] replace the true
distribution by a weighted mixture of distributions from some (hypothesis) class. Using
the large class of all (semi)measures that are (semi)computable on a Turing machine
bears in mind Epicurus, who teaches not to discard any (consistent) hypothesis. In
order not to ignore Occam, who would select the simplest hypothesis, Solomonoff
defined a universal prior that assigns high/low prior weight to simple/complex
environments, where Kolmogorov quantifies complexity [Hut07a, LV08]. All other
concepts and phenomena attributed to intelligence are emergent. All together, this
solves all conceptual problems [Hut05], and "only" computational problems remain.

Kolmogorov complexity. Kolmogorov [Kol65] defined the complexity of a string
$x \in \mathcal{X}^*$ over some finite alphabet $\mathcal{X}$ as the length $\ell$ of a shortest
description $p \in \{0,1\}^*$ on a universal Turing machine U:

Kolmogorov complexity: $K(x) := \min_p \{\ell(p) : U(p) = x\}$

A string is simple if it can be described by a short program, like "the string of 
one million ones", and is complex if there is no such short description, like for a 
random string whose shortest description is specifying it bit-by-bit. For non-string 
objects o one defines $K(o) := K(\langle o \rangle)$, where $\langle o \rangle \in \mathcal{X}^*$ is some standard code for o.
Kolmogorov complexity [Kol65, Hut08] is a key concept in (algorithmic) information
theory [LV08]. An important property of K is that it is nearly independent of the
choice of U, i.e. different choices of U change K "only" by an additive constant
(see Section Bhl). Furthermore it leads to shorter codes than any other effective
code. K shares many properties with Shannon's entropy (information measure) S
[Mac03, CT06], but K is superior to S in many respects. Foremost, K measures the
information of individual outcomes, while S can only measure expected information
of random variables. To be brief, K is an excellent universal complexity measure,
suitable for quantifying Occam's razor. The major drawback of K as a complexity
measure is its incomputability. So in practical applications it always has to be
approximated, e.g. by Lempel-Ziv compression [LZ76, CV05], or by CTW [WST97]
compression, or by using two-part codes like in MDL and MML, or by others.
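
As a rough illustration of approximating K by compression, the following sketch uses the length of a zlib-compressed string as a computable upper-bound proxy for its complexity. The choice of zlib (a DEFLATE implementation from the Lempel-Ziv family) and the helper name `compressed_length_bits` are assumptions made here for illustration; this is not the LZ76 complexity measure itself, and any real compressor only bounds K from above up to an additive constant.

```python
import os
import zlib


def compressed_length_bits(data: bytes) -> int:
    """Computable stand-in for K(data): bit length of the zlib-compressed data."""
    return 8 * len(zlib.compress(data, 9))


# "The string of one million ones" has a very short description.
simple = b"1" * 1_000_000
# Random bytes are incompressible with overwhelming probability.
random_like = os.urandom(125_000)

k_simple = compressed_length_bits(simple)
k_random = compressed_length_bits(random_like)
print("simple string :", k_simple, "bits for", 8 * len(simple), "raw bits")
print("random string :", k_random, "bits for", 8 * len(random_like), "raw bits")
```

The simple string shrinks by orders of magnitude, while the random string does not shrink at all, mirroring the simple/complex distinction in the text.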

Solomonoff induction. Solomonoff [Sol64] defined (earlier) the closely related
universal a priori probability M(x) as the probability that the output of a universal
(monotone) Turing machine U starts with x when provided with fair coin flips on
the input tape [HLV07]. Formally,

Solomonoff prior: $M(x) := \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}$,

where the sum is over all (possibly non-halting) so-called minimal programs p which
output a string starting with x. Since the sum is dominated by short programs, we
have $M(x) \approx 2^{-K(x)}$ (formally $-\log M(x) = K(x) + O(\log \ell(x))$), i.e. simple/complex
strings are assigned a high/low a-priori probability. A different representation is
as follows [ZL70]: Let $\mathcal{M} = \{\nu\}$ be a countable class of probability measures $\nu$
(environments) on infinite sequences $\mathcal{X}^\infty$, let $\mu \in \mathcal{M}$ be the true sampling
distribution, i.e. $\mu(x)$ is the true probability that an infinite sequence starts with x, and
let $\xi_{\mathcal{M}}(x) := \sum_{\nu \in \mathcal{M}} w_\nu \nu(x)$ be the w-weighted average called Bayesian mixture
distribution. One can show that $M(x) = \xi_{\mathcal{M}_U}(x)$, where $\mathcal{M}_U$ includes all computable
probability measures and $w_\nu = 2^{-K(\nu)}$. More precisely, $\mathcal{M}_U := \{\nu_1, \nu_2, \ldots\}$ consists of
an effective enumeration of all so-called lower semi-computable semi-measures $\nu_i$,
and $K(\nu_i) := K(i) := K(\langle i \rangle)$ [LV08].

M can be used as a universal sequence predictor, which outperforms in a strong
sense all other predictors. Consider the classical online sequence prediction task:
Given $x_{<t} \equiv x_{1:t-1} := x_1 \ldots x_{t-1}$, predict $x_t$; then observe the true $x_t$; repeat.
For $x_{1:\infty}$ generated by the unknown "true" distribution $\mu \in \mathcal{M}_U$ one can show
[Sol78] that the universal predictor $M(x_t | x_{<t}) := M(x_{1:t}) / M(x_{<t})$ rapidly converges
to the true probability $\mu(x_t | x_{<t}) = \mu(x_{1:t}) / \mu(x_{<t})$ of the next observation $x_t \in \mathcal{X}$ given
history $x_{<t}$. That is, M serves as an excellent predictor of any sequence sampled
from any computable probability distribution.
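
The universal predictor itself is incomputable, but the Bayesian mixture representation can be sketched on a small finite class. The example below is a minimal sketch under strong simplifying assumptions: the class consists of five i.i.d. Bernoulli measures (a hypothetical stand-in for the class of all semi-computable semi-measures), and the prior weights 2^-(i+1) merely imitate the form of 2^-K; they are not actual Kolmogorov complexities. The function name `mixture_predictions` is likewise illustrative.

```python
import random


def mixture_predictions(thetas, prior_weights, bits):
    """Return the mixture probability of x_t = 1 given x_<t, for each t."""
    total = sum(prior_weights)
    posterior = [w / total for w in prior_weights]
    predictions = []
    for bit in bits:
        # Predictive probability: posterior-weighted average of each model's prediction.
        predictions.append(sum(w * th for w, th in zip(posterior, thetas)))
        # Bayes update: reweight each model by the likelihood it gave the observed bit.
        likelihoods = [th if bit == 1 else 1.0 - th for th in thetas]
        posterior = [w * lk for w, lk in zip(posterior, likelihoods)]
        norm = sum(posterior)
        posterior = [w / norm for w in posterior]
    return predictions


# Hypothetical model class: Bernoulli(theta) sources, "simpler" models listed first.
thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
prior_weights = [2.0 ** -(i + 1) for i in range(len(thetas))]

# The "true" environment is in the class but unknown to the predictor.
true_theta = 0.7
rng = random.Random(0)
bits = [1 if rng.random() < true_theta else 0 for _ in range(2000)]

predictions = mixture_predictions(thetas, prior_weights, bits)
print("first prediction:", predictions[0])
print("last prediction :", predictions[-1], "(true probability:", true_theta, ")")
```

Although the prior initially favors the wrong models, the posterior concentrates on the true one and the predictive probability approaches the true probability of the next bit.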

The AIXI model. It is possible to write down the AIXI model explicitly in one
line [Hut07c], although one should not expect to be able to grasp the full meaning
and power from this compact representation.

AIXI is an agent that interacts with an environment in cycles $k = 1, 2, \ldots, m$. In
cycle k, AIXI takes action $a_k$ (e.g. a limb movement) based on past perceptions
$o_1 r_1 \ldots o_{k-1} r_{k-1}$ as defined below. Thereafter, the environment provides a (regular)
observation $o_k$ (e.g. a camera image) to AIXI and a real-valued reward $r_k$. The
reward can be very scarce, e.g. just +1 (-1) for winning (losing) a chess game, and
0 at all other times. Then the next cycle k + 1 starts. Given the above, AIXI is
defined by:

AIXI: $a_k := \arg\max_{a_k} \sum_{o_k r_k} \ldots \max_{a_m} \sum_{o_m r_m} [r_k + \cdots + r_m] \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$

The expression shows that AIXI tries to maximize its total future reward $r_k + \ldots + r_m$.
If the environment is modeled by a deterministic program q, then the future perceptions
$\ldots o_k r_k \ldots o_m r_m = U(q, a_1 \ldots a_m)$ can be computed, where U is a universal (monotone
Turing) machine executing q given $a_1 \ldots a_m$. Since q is unknown, AIXI has to maximize
its expected reward, i.e. average $r_k + \ldots + r_m$ over all possible perceptions created by
all possible environments q. The simpler an environment, the higher is its a-priori
contribution $2^{-\ell(q)}$, where simplicity is measured by the length $\ell$ of program q. The
inner sum $\sum_{q : \ldots} 2^{-\ell(q)}$ generalizes Solomonoff's a-priori distribution M by including
actions. Since noisy environments are just mixtures of deterministic environments,
they are automatically included. The sums in the formula constitute the averaging
process. Averaging and maximization have to be performed in chronological order,
hence the interleaving of max and $\Sigma$ (similarly to minimax for games). The value
V of AIXI (or any other agent) is its expected reward sum.
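
The interleaving of max and sum can be made concrete on a toy scale. The sketch below is not AIXI: it replaces the sum over all programs on a universal machine by a sum over two hand-written deterministic environments with assumed "program lengths" (2 and 5), and uses a horizon of m = 2. All names (`env_a`, `env_b`, `action_value`, `best_action`) are hypothetical. What it does share with the formula above is the structure: environments are weighted by 2^-length, percepts partition the environments still consistent with the history, and maximization and summation alternate in chronological order.

```python
from collections import defaultdict

ACTIONS = (0, 1)


def env_a(actions):
    """Hypothetical environment: percept (o, r) with reward 1 iff the latest action is 0."""
    return (0, 1 if actions[-1] == 0 else 0)


def env_b(actions):
    """Hypothetical environment: percept (o, r) with reward 1 iff the latest action is 1."""
    return (0, 1 if actions[-1] == 1 else 0)


# Pairs of (environment, assumed program length); the prior weight is 2 ** -length.
ENVIRONMENTS = [(env_a, 2), (env_b, 5)]


def value(actions, envs, reward_sum, horizon):
    """Unnormalized expectimax value of a history with its consistent environments."""
    if len(actions) == horizon:
        # Innermost term: [r_k + ... + r_m] times the weight of all consistent programs.
        return reward_sum * sum(2.0 ** -length for _, length in envs)
    return max(action_value(actions, envs, reward_sum, horizon, a) for a in ACTIONS)


def action_value(actions, envs, reward_sum, horizon, action):
    """Sum over the percepts that the still-consistent environments can produce."""
    extended = actions + (action,)
    by_percept = defaultdict(list)
    for env, length in envs:
        by_percept[env(extended)].append((env, length))
    return sum(
        value(extended, consistent, reward_sum + percept[1], horizon)
        for percept, consistent in by_percept.items()
    )


def best_action(actions, envs, horizon):
    """Outermost arg max over the next action."""
    return max(ACTIONS, key=lambda a: action_value(actions, envs, 0, horizon, a))


first_action = best_action((), ENVIRONMENTS, horizon=2)
print("first action:", first_action)
```

With these assumed lengths the shorter environment dominates the prior, so the agent first tries the action that this environment rewards; the percept it receives then reveals which environment it is actually in.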

One can fix any finite action and perception space, any reasonable U, and any 
large finite lifetime m. This completely and uniquely defines AIXI's actions $a_k$,
which are limit-computable via the expression above (all quantities are known). 

That's it! Ok, not really. It takes a whole book and more to explain why AIXI 
likely is the most intelligent general-purpose agent and incorporates all aspects of 
rational intelligence. In practice, AIXI needs to be approximated. AIXI can also be
regarded as the gold standard which other practical general-purpose AI programs
should aim at (analogous to minimax approximations/heuristics).

The role of AIXI for AI. The AIXI model can be regarded as the first complete 
theory of AI. Most if not all AI problems can easily be formulated within this theory, 
which reduces the conceptual problems to pure computational questions. Solving 
the conceptual part of a problem often causes a quantum leap forward in a field. 
Two analogies may help: QED is a complete theory of all chemical processes. ZFC 
solved the conceptual problems of sets (e.g. Russell's paradox). 

From an algorithmic information theory (AIT) perspective, the AIXI model gen- 
eralizes optimal passive universal induction to the case of active agents. From a 
decision-theoretic perspective, AIXI is a suggestion of a new (implicit) "learning" 
algorithm, which may overcome all (except computational) problems of previous 
reinforcement learning algorithms. If the optimality theorems of universal induction 
and decision theory generalize to the unified AIXI model, we would have, for the 
first time, a universal (parameterless) model of an optimal rational agent in any 
computable but unknown environment with reinforcement feedback. 

Although deeply rooted in algorithm theory, AIT mainly neglects computation 
time and so does AIXI. It is important to note that this does not make the AI 
problem trivial. Playing chess optimally or solving NP-complete problems become 
trivial, but driving a car or surviving in nature do not. This is because it is a 
challenge itself to well-define the latter problems, not to mention presenting an 
algorithm. In other words: The AI problem has not yet been well defined (cf. the 
quote after the abstract). One may view AIXI as a suggestion of such a mathematical
definition.

Although Kolmogorov complexity is incomputable in general, Solomonoff's theory
triggered an entire field of research on computable approximations. This led
to numerous practical applications [LV07]. If the AIXI model should lead to a
universal "active" decision maker with properties analogous to those of universal
"passive" predictors, then we could expect a similar stimulation of research on
resource-bounded, practically feasible variants. First attempts have been made to
test the power and limitations of AIXI and downscaled versions like AIXItl and AI$\xi$
[PH06b, Pan08], as well as related models derived from basic concepts of algorithmic
information theory.

So far, some remarkable and surprising results have already been obtained (see
Section 3). Introductions to the AIXI model of 2, 12, 60, and 300 pages can be found
in [Hut01e, Hut01d, Hut07c, Hut05], respectively, and a gentle introduction to UAI in
[Leg08].

3 History and State-of-the-Art 

The theory of UAI and AIXI builds on the theories of universal induction, universal
prediction, universal decision making, and universal agents. From a historical and 
research-field perspective, the AIXI model is based on two otherwise unconnected 
fundamental theories: 

(1) The major basis is algorithmic information theory [LV08], initiated by [Sol64,
Kol65, Cha66], which builds the foundation of complexity and randomness of
individual objects. It can be used to quantify Occam's razor principle (use the
simplest theory consistent with the data). This in turn allowed Solomonoff to
come up with a universal theory of induction [Sol64, Sol78].

(2) The other basis is the theory of optimal sequential decisions, initiated by Von
Neumann [NM44] and Bellman [Bel57]. This theory builds the basis of modern
reinforcement learning [SB98].

This section outlines the history and state-of-the-art of the theories and research 
fields involved in the AIXI model. 

Algorithmic information theory (AIT). In the 1960s, [Kol65, Sol64, Cha66]
introduced a new machine-independent complexity measure for arbitrary computable
data. The Kolmogorov complexity K(x) is defined as the length of the shortest
program on a universal Turing machine that computes x. It is closely related to
Solomonoff's universal a-priori probability $M(x) \approx 2^{-K(x)}$ (see above), Martin-Löf
randomness of individual sequences [ML66], time-bounded complexity [Lev84],
universal optimal search [Lev73], the speed prior [Sch02b], the halting probability $\Omega$
[Cha87], strong mathematical undecidability [Cha03a], generalized probability and
complexity [Sch02a], algorithmic statistics [GTV01, VV02, Vit02], and others.

Despite its incomputability, AIT has found many applications in philosophy, practice,
and science: The minimum message/description length (MML/MDL) principles
[WB68, Ris78, Ris89] can be regarded as a practical approximation of Kolmogorov
complexity. MML & MDL are widely used in machine learning applications
[QR89, GL89, MJ93, Ped89, Wal05, Gru07]. The latest, most direct and impressive
applications are via the universal similarity metric [CV05, EY06]. Schmidhuber
produced another range of impressive applications to neural networks [Sch97a, SZW97],
in search problems [Sch04], and even in the fine arts [Sch97b]. By carefully
approximating Kolmogorov complexity, AIT sometimes leads to results unmatched by
other approaches. Besides these practical applications, AIT is used to simplify
proofs via the incompressibility method, improves Shannon information, is used in
reversible computing, physical entropy and Maxwell daemon issues, artificial
intelligence, and the asymptotically fastest algorithm for all well-defined problems
[Cal02, Hut05, Hut02a, Hut07a, LV08].
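
In practice the universal similarity metric is approximated by the normalized compression distance (NCD), in which a real compressor stands in for Kolmogorov complexity. The following minimal sketch assumes zlib as that compressor and uses made-up test strings; the function names `c` and `ncd` are illustrative, and results with other compressors or very short inputs may differ.

```python
import os
import zlib


def c(data: bytes) -> int:
    """Compressed length in bytes; a computable stand-in for K(data)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


text_a = b"the quick brown fox jumps over the lazy dog " * 50
text_b = b"the quick brown fox jumps over the lazy cat " * 50
noise = os.urandom(len(text_a))

print("similar texts  :", ncd(text_a, text_b))
print("text vs. noise :", ncd(text_a, noise))
```

Objects that share structure compress well together and receive a small distance, while unrelated objects receive a distance close to 1.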

Universal Solomonoff induction. How and in which sense induction is possible
at all has been subject to long philosophical controversies [Hum39, Sto01, Hut05].
Highlights are Epicurus' principle of multiple explanations [Asm84], Occam's razor
(simplicity) principle [Fra02], and Bayes' rule for conditional probabilities
[Bay63, Ear93]. Solomonoff [Sol64] elegantly unified these aspects with the concept
of universal Turing machines [Tur36] to one formal theory of inductive inference
based on a universal probability distribution M, which is closely related to
Kolmogorov complexity K ($M(x) \approx 2^{-K(x)}$). The theory allows for optimally predicting
sequences without knowing their true generating distribution $\mu$ [Sol78], and
presumably solves the induction problem. The theory remained for more than 20 years at
this stage, till the work on AIXI started, which resulted in a beautiful elaboration
and extension of Solomonoff's theory.

Meanwhile, the (non)existence of universal priors for several generalized
computability concepts [Sch02a, Hut03b, Hut06b] has been classified, rapid convergence
of M to the unknown true environmental distribution $\mu$ [Hut01a] and tight error
[Hut01c] and loss bounds for arbitrary bounded loss functions and finite alphabet
[Hut01b, Hut03a] have been proven, and (Pareto) optimality of M [Hut03d, Hut03b]
has been shown, exemplified on games of chance and compared to predictions
with expert advice [Hut03d, Hut04b]. The bounds have been further improved
by introducing a version of Kolmogorov complexity that is monotone in the
condition [CH05, CHS07]. Similar but necessarily weaker non-asymptotic bounds
for universal deterministic/one-part MDL [Hut03e, Hut06d] and discrete two-part
MDL [PH04a, PH05a, PH04b, PH06a] have also been proven. Quite unexpectedly
[Hut03c], M does not converge on all Martin-Löf random sequences [HM04], but
there is a sophisticated remedy [HM07].

All together this shows that Solomonoff's induction scheme represents a universal
(formal, but incomputable) solution to all passive prediction problems. The most
recent studies [Hut06c] suggest that this theory could solve the induction problem
as a whole, or at least constitute significant progress on this fundamental problem
[Hut07b].

Sequential decision theory. Sequential decision theory provides a framework for
finding optimal reward-maximizing strategies in reactive environments (e.g. chess
playing as opposed to weather forecasting), assuming the environmental probability
distribution $\mu$ is known. The Bellman equations [Bel57] are at the heart of sequential
decision theory [NM44, Mic66, RN03]. The book [Ber06] summarizes open problems
and progress in infinite horizon problems. Sequential decision theory can deal with
actions and observations depending on arbitrary past events. This general setup
has been called the AI$\mu$ model in [Hut05, Hut07c]. Optimality of AI$\mu$ is obvious by
construction. This model reduces in special cases to a range of known models.
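
One of the special cases in which the Bellman equations become directly computable is a small Markov decision process with a known transition distribution. The sketch below solves such a case by finite-horizon backward induction. The two-state MDP, its rewards, and the function name `backward_induction` are invented for illustration; the general setting described above allows dependence on the entire history, which this Markov example deliberately ignores.

```python
# Known environment: P[state][action] = list of (probability, next_state, reward).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.5, 0, 0.0), (0.5, 1, 1.0)]},
    1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
}


def backward_induction(P, horizon):
    """Return optimal values at time 0 and the optimal policy for each time step."""
    values = {s: 0.0 for s in P}  # value after the final step is zero
    policy = []
    for _ in range(horizon):
        # Bellman backup: Q(s, a) = sum_s' P(s'|s,a) * (r + V_next(s')).
        q = {
            s: {
                a: sum(p * (r + values[s2]) for p, s2, r in outcomes)
                for a, outcomes in P[s].items()
            }
            for s in P
        }
        policy.append({s: max(q[s], key=q[s].get) for s in P})
        values = {s: max(q[s].values()) for s in P}
    policy.reverse()  # policy[t] is the decision rule at time step t
    return values, policy


values, policy = backward_induction(P, horizon=3)
print("optimal values:", values)
print("first-step policy:", policy[0])
```

Because the distribution is known, no learning is involved: the optimal policy follows from the recursion alone, which is the sense in which optimality is "obvious by construction".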

Reinforcement learning. If the true environmental probability distribution $\mu$
or the reward function are unknown, they need to be learned [SB98]. This
dramatically complicates the problem due to the exploration-exploitation dilemma
[BF85, Duf02, Hut05, SL08]. In order to attack this intrinsically difficult problem,
control theorists typically confine themselves to linear systems with quadratic
loss function, relevant in the control of (simple) machines, but irrelevant for AI.
There are notable exceptions to this confinement, e.g. the book [KV86] on stochastic
adaptive control and [ATA89a, ATA89b], and an increasing number of more
recent works. Reinforcement learning (RL) (sometimes associated with temporal
difference learning or neural nets) is the instantiation of stochastic adaptive control
theory [KV86] in the machine learning community. Current research on RL is
vast; the most important conferences are ICML, COLT, ECML, ALT, and NIPS;
the most important journals are JMLR and MLJ. Some highlights and surveys are
[Sam59, BSA83, SutM, WatM, WD92, MA93, TesM, WS98, KK99, WSS99, Bau99,
KP00, SLJ+03, GKPV03, RH08a, SDL07, STM, RPPCd08, Hut09b, Hut09a] and
[KLM96, KLC98, SB98, BDH99, Ber06] respectively. RL has been applied to a
variety of real-world problems, occasionally with stunning success: Backgammon and
Checkers [SB98, Chp. 11], helicopter control [NCD+04], and others. Nevertheless,
existing learning algorithms are very limited (typically to Markov domains), and
non-optimal: from the very outset they are approximate or asymptotic only.
Indeed, AIXI is currently the only general and rigorous mathematical formulation of
the addressed problems.

The universal algorithmic agent AIXI. Reinforcement learning algorithms
[KLM96, BT96, SB98] are usually used in the case of unknown $\mu$. They can succeed
if the state space is either small or has effectively been made small by generalization
techniques. The algorithms work only in restricted (e.g. Markov) domains,
have problems with optimally trading off exploration versus exploitation, have
non-optimal learning rate, are prone to diverge, or are otherwise ad hoc.

The formal solution proposed in [Hut01d, Hut05] is to generalize the universal
probability M to include actions as conditions and replace $\mu$ by M in the AI$\mu$ model,
resulting in the AIXI model, which is presumably universally optimal. It is quite
non-trivial to determine what can be expected from a universally optimal agent and
to properly interpret or define universal, optimal, etc. [Hut07c]. It is known that M
converges to $\mu$ also in case of multi-step lookahead as occurs in the AIXI model [Hut04a],
and that a variant of AIXI is asymptotically self-optimizing and Pareto optimal
[Hut02b, EHOg].

The book [Hut05] gives a comprehensive introduction and discussion of previous
achievements on or related to AIXI, including a critical review, more open problems, 
comparison to other approaches to AI, and philosophical issues. 

Important environmental classes. In practice, one is often interested in specific 
classes of problems rather than the fully universal setting; for example we might be 
interested in evaluating the performance of an algorithm designed solely for function 
maximization. A taxonomy of abstract environmental classes from the mathematical
perspective of interacting chronological systems [LH04b, Leg08] has been
established. The relationships between Bandit problems, MDP problems, ergodic MDPs,
higher order MDPs, sequence prediction problems, function optimization problems,
strategic games, classification, and many others are formally defined and explored
therein. The work also suggests new abstract environmental classes that could be
useful from an analytic perspective. In [Hut05], each problem class is formulated
in its natural way for known $\mu$, and then a formulation within the AI$\mu$ model is
constructed and their equivalence is shown. Then, the consequences of replacing $\mu$
by M are considered, and in which sense the problems are formally solved by AIXI.

Computational aspects. The major drawback of AIXI is that it is incomputable,
or more precisely, only asymptotically computable, which makes a direct implementation
impossible. To overcome this problem, the AIXI model can be scaled down
to a model coined AIXItl, which is still superior to any other time t and length l
bounded agent [Hut01d, Hut05]. The computation time of AIXItl is of the order
t·2^l. A way of overcoming the large multiplicative constant 2^l is possible at the
expense of an (unfortunately even larger) additive constant. The constructed algorithm
builds upon Levin search [Lev73, Gag07]. The algorithm is capable of solving
all well-defined problems p as quickly as the fastest algorithm computing a solution
to p, save for a factor of 1+ε and lower-order additive terms [Hut02a]. The solution
requires an implementation of first-order logic, the definition of a universal Turing
machine within it and a proof theory system. As it stands, the algorithm is only of
theoretical interest, but there are more practical variations [Sch04, Sch05]. A different,
more limited but more practical scaled-down version (coined AIξ) has been implemented
and applied successfully to 2x2 matrix games like the notoriously difficult
repeated prisoner problem and generalized variants thereof [PH06b].
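
The idea behind Levin search mentioned above can be illustrated with a toy sketch. This is not the algorithm of [Hut02a]; it only shows the time-sharing principle in which, in phase i, every candidate program of length l ≤ i receives a budget of 2^(i-l) steps. The encoding of a "program" as a pair (length, generator factory) and the names `levin_search` and `slow_program` are hypothetical choices made for this example.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical encoding: a program is (length_in_bits, factory). The factory
# returns a fresh generator that yields None while it is still computing and
# yields a non-None candidate when it halts.
Program = Tuple[int, Callable[[], Iterator[Optional[object]]]]


def slow_program(steps: int, answer: object) -> Callable[[], Iterator[Optional[object]]]:
    """Build a toy program that outputs `answer` on its `steps`-th step."""
    def factory() -> Iterator[Optional[object]]:
        def run() -> Iterator[Optional[object]]:
            for _ in range(steps - 1):
                yield None          # still computing
            yield answer            # halts with a candidate
        return run()
    return factory


def levin_search(programs: List[Program],
                 is_solution: Callable[[object], bool],
                 max_phase: int = 30) -> Optional[Tuple[object, int]]:
    """Return (solution, phase) for the first verified candidate, or None."""
    for phase in range(1, max_phase + 1):
        for length, factory in programs:
            if length > phase:
                continue                      # program too long for this phase
            budget = 2 ** (phase - length)    # shorter programs get more time
            run = factory()                   # restart from scratch each phase
            for _ in range(budget):
                try:
                    out = next(run)
                except StopIteration:
                    break
                if out is not None:
                    if is_solution(out):
                        return out, phase
                    break                     # halted with a wrong candidate
    return None
```

With a short but slow program (length 1, 1000 steps) and a long but fast one (length 5, 1 step), the fast program is found first, in phase 5; the slow one alone would need phase 11, since 2^10 = 1024 is the first budget of at least 1000 steps.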



4 Open Problems in Universal Induction 

The induction problem is a fundamental problem in philosophy [Hum39, Ear93]
and science [Jay03]. Solomonoff's model is a promising universal solution of the
induction problem. In [Hut07b], an attempt has been made to collect the most
important fundamental philosophical and statistical problems, regarded as open,
and to present arguments and proofs that Solomonoff's theory overcomes them.
Despite the force of the arguments, they are likely not yet sufficient to convince
the (scientific) world that the induction problem is solved. The discussion needs
to be rolled out much further, say, at least one generally accessible article per
allegedly open problem. Indeed, this endeavor might even discover some catch in
Solomonoff's theory. Some problems identified and outlined in [Hut07b] that are
worth investigating in more detail are:

a) The zero prior problem. The problem is how to confirm universal hypotheses
like H := "all balls in some urn (or all ravens) are black". A natural model
is to assume that balls (or ravens) are drawn randomly from an infinite population
with fraction θ of black balls (or ravens) and to assume some prior
density over θ ∈ [0;1] (a uniform density gives the Bayes-Laplace model). Now
we draw n objects and observe that they are all black. The problem is that
the posterior probability P[H | black_1...black_n] = 0, since the prior probability
P[H] = P[θ = 1] = 0. Maher's [Mah04] approach does not solve the problem
[Hut07b].

b) The black raven paradox by Carl Gustav Hempel goes as follows [Res01,
Ch.11.4]: Observing Black Ravens confirms the hypothesis H that all ravens
are black. In general, (i) hypothesis R→B is confirmed by R-instances with
property B. Formally substituting R⇝¬B and B⇝¬R leads to (ii) hypothesis
¬B→¬R is confirmed by ¬B-instances with property ¬R. But (iii) since
R→B and ¬B→¬R are logically equivalent, R→B must also be confirmed
by ¬B-instances with property ¬R. Hence by (i), observing Black Ravens confirms
Hypothesis H, so by (iii), observing White Socks also confirms that all
Ravens are Black, since White Socks are non-Ravens which are non-Black.
But this conclusion is absurd. Again, neither Maher's nor any other approach
solves this problem.

c) The Grue problem [Goo83]. Consider the following two hypotheses:
H1 := "All emeralds are green", and H2 := "All emeralds found till year 2020
are green, thereafter all emeralds are blue". Both hypotheses are equally well
supported by empirical evidence. Occam's razor seems to favor the more plausible
hypothesis H1, but by using new predicates grue := "green till y2020 and
blue thereafter" and bleen := "blue till y2020 and green thereafter", H2 gets
simpler than H1.

d) Reparametrization invariance [KW96]. The question is how to extend the
symmetry principle from finite hypothesis classes (all hypotheses are equally
likely) to infinite hypothesis classes. For "compact" classes, Jeffreys' prior
[Jef46] is a solution, but for non-compact spaces like ℕ or ℝ, classical statistical
principles lead to improper distributions, which are often not acceptable.

e) Old-evidence/updating problem and ad-hoc hypotheses [Gly80]. How shall
a Bayesian treat the case when some evidence E=x (e.g. Mercury's perihelion
advance) is known well before the correct hypothesis/theory/model H=μ
(Einstein's general relativity theory) is found? How shall H be added to
the Bayesian machinery a posteriori? What is the prior of H? Should it
be the belief in H in a hypothetical counterfactual world in which E is not
known? Can old evidence E confirm H? After all, H could simply be
constructed/biased/fitted towards "explaining" E. Strictly speaking, a Bayesian
needs to choose the hypothesis/model class before seeing the data, which seldom
reflects scientific practice [Ear93].

f) Other issues/problems. Comparison to Carnap's confirmation theory
[Car52] and Laplace's rule [Lap12], allowing for continuous model classes, how
to incorporate prior knowledge [Pre02, Gol06], and others.
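
The zero prior problem of item a) can be checked with exact arithmetic. The sketch below computes the Bayes-Laplace predictive probability under a uniform density on θ, and the posterior of H when the prior is given a point mass on θ = 1. The point-mass prior and the function names are assumptions introduced only for illustration; they are not presented here as the resolution argued for in [Hut07b].

```python
from fractions import Fraction


def laplace_predictive(n: int) -> Fraction:
    """P[next draw is black | n black draws] under a uniform density on theta.

    Ratio of integrals: int theta^(n+1) dtheta / int theta^n dtheta = (n+1)/(n+2).
    """
    return Fraction(n + 1, n + 2)


def posterior_of_H(n: int, point_mass: Fraction) -> Fraction:
    """P[theta = 1 | n black draws] for a mixed prior.

    The prior puts `point_mass` on theta = 1 (hypothesis H) and spreads the
    remaining mass uniformly over [0;1]. This prior is an illustrative assumption.
    """
    evidence_uniform = Fraction(1, n + 1)   # int_0^1 theta^n dtheta
    numerator = point_mass                  # likelihood of n black draws under H is 1
    denominator = numerator + (1 - point_mass) * evidence_uniform
    return numerator / denominator
```

With `point_mass = 0` (a pure density prior) the posterior of H stays at 0 for every n, although the predictive probability `laplace_predictive(n)` tends to 1. Any positive point mass lets the posterior of H grow with n.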

Solomonoff's theory has already been intensively studied in the predictive setting
[Sol78, Hut01c, Hut03d, Hut03a, CHS07], mostly confirming its power, with the
occasional unexpected exception [HM07]. Important open questions are:

g) Prediction of selected bits. Consider a very simple and special case of
problem 5i: a binary sequence that coincides at even times with the preceding
(odd) bit, but is otherwise incomputable. Every child will quickly realize that
the even bits coincide with the preceding odd bit, and after a while perfectly
predict the even bits, given the past bits. The incomputability of the sequence
is no hindrance. It is unknown whether Solomonoff's predictor works or fails in this
situation. I expect that a solution of this special case will lead to general
useful insights and advance this theory (cf. problem IHj) . 

h) Identification of "natural" Turing machines. In order to pin down the
additive/multiplicative constants that plague most results in AIT, it would
be highly desirable to identify a class of "natural" UTMs/USMs which have
a variety of favorable properties. A more moderate approach may be to consider
classes C_i of universal Turing machines (UTM) or universal semimeasures
(USM) satisfying certain properties P_i and showing that the intersection ∩_i C_i
is not empty. Indeed, very occasionally results in AIT only hold for particular
(subclasses of) UTMs [MP02]. A grander vision is to find the single "best"
UTM or USM [Mul06] (a remarkable approach).

i) Martin-Löf convergence. Quite unexpectedly, a loophole in the proof of
Martin-Löf (M.L.) convergence of M to μ in the literature has been found
[Hut03c]. In [HM04] it has been shown that this loophole cannot be fixed,
since M.L.-convergence actually can fail. The construction of non-universal
(semi)measures D and W that M.L.-converge to μ [HM07] partially rescued
the situation. The major problem left open is the convergence rate for W→μ.
The current bound for D→μ is double exponentially worse than for M→μ.
It is also unknown whether convergence in ratio holds. Finally, there could still
exist universal semimeasures M (dominating all enumerable semimeasures)
for which M.L.-convergence holds. In case they exist, they probably have
particularly interesting additional structure and properties.



j) Generalized mixtures and convergence concepts. Another interesting
and potentially fruitful approach to the above convergence problem is to consider
other classes of semimeasures ℳ [Sch02b, Sch02a, Hut03b], define mixtures
ξ over ℳ, and (possibly) generalized randomness concepts by using
this ξ to define a generalized notion of randomness. Using this approach, in
[Hut06b] it has been shown that convergence holds for a subclass of Bernoulli
distributions if the class is dense, but fails if the class is gappy, showing that
a denseness characterization of ℳ could be promising in general. See also
[RH07, RH08b].



k) Lower convergence bounds and defect of M. One can show that
M(x_t | x_{<t}) > 2^{-K(t)}, i.e. the probability of making a wrong prediction x_t
converges to zero slower than any computable summable function. This shows
that, although M converges rapidly to μ in a cumulative sense, occasionally,
namely for simply describable t, the prediction quality is poor. An easy way
to show the lower bound is to exploit the semimeasure defect of M. Do similar
lower bounds hold for a proper (Solomonoff) normalized measure M_norm? I
conjecture the answer is yes, i.e. the lower bound is not a semimeasure artifact,
but "real".



l) Using AIXI for prediction. Since AIXI is a unification of sequential decision
theory with the idea of universal probability, one may think that the AIXI
model for a sequence prediction problem exactly reduces to Solomonoff's universal
sequence prediction scheme. Unfortunately this is not the case. For
one reason, M is only a probability distribution on the inputs but not on the
outputs. This is also one of the origins of the difficulty of proving general
value bounds for AIXI. The question is whether, nevertheless, AIXI predicts
sequences as well as Solomonoff's scheme. A first weak bound in a very restricted
setting has been proven in [Hut05, Sec.6.2], showing that progress in
this question is possible.
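
The setting of item g) can be made concrete with a small simulation. The learner below is a simple context-counting predictor, used here only as a stand-in for the "child" in the text; it is not Solomonoff's M, and nothing about M's behavior should be read off from it. The odd bits come from a pseudo-random generator, which is of course computable, so they merely imitate the incomputable part of the sequence. The function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import List


def make_sequence(n_pairs: int, seed: int = 0) -> List[int]:
    """Bits x_1, x_2, ... where each even-time bit repeats the preceding odd-time bit."""
    rng = random.Random(seed)
    bits: List[int] = []
    for _ in range(n_pairs):
        b = rng.randint(0, 1)   # stand-in for the "otherwise incomputable" odd bit
        bits += [b, b]          # the even bit copies it
    return bits


def count_even_errors(bits: List[int]) -> int:
    """Predict each bit from the context (time parity, previous bit) by majority
    count, and return the number of wrong predictions at even times only."""
    counts = defaultdict(lambda: [0, 0])   # context -> [#zeros seen, #ones seen]
    errors = 0
    prev = None
    for t, bit in enumerate(bits, start=1):
        c = counts[(t % 2, prev)]
        guess = 1 if c[1] > c[0] else 0    # default guess is 0 when undecided
        if t % 2 == 0 and guess != bit:
            errors += 1
        c[bit] += 1
        prev = bit
    return errors
```

At even times the context determines the bit, so this learner errs at most once there (the first time the preceding bit is 1), regardless of how the odd bits are generated.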



The most important open, but unfortunately likely also the hardest, problem is the
formal identification of natural universal (Turing) machines (4h). A proper solution
would eliminate one of the two most important critiques of the whole field of AIT.
Item 4l is an important question for universal AI.

5 Open Problems regarding Optimality of AIXI



AIXI has been shown to be Pareto-optimal and a variant of AIXI to be self-optimizing
[Hut02b]. These are important results supporting the claim that AIXI
is universally optimal. More results can be found in [Hut05]. Unlike the induction
case, the results are not strong enough to allay all doubts. Indeed, the major problem
is not to prove optimality but to come up with a sufficiently strong but still
satisfiable optimality notion in the reinforcement learning case. The following items
list four potential approaches towards a solution:

a) What is meant by universal optimality? A "learner" (like AIXI) may
converge to the optimal informed decision maker (like AIμ) in several senses.
Possibly relevant concepts from statistics are consistency, self-tuningness,
self-optimizingness, efficiency, unbiasedness, asymptotic or finite convergence
[KV86], Pareto-optimality, and some more defined in [Hut05]. Some concepts
are stronger than necessary, others are weaker than desirable but suitable to
start with. It is necessary to investigate in more breadth which properties the
AIXI model satisfies.

b) Limited environmental classes. The problem of defining and proving
general value bounds becomes more feasible by considering, in a first step,
restricted concept classes. One could analyze AIXI for known classes (like
Markov or factorizable environments) and especially for the new classes (forgetful,
relevant, asymptotically learnable, farsighted, uniform, and (pseudo-)passive)
defined in [Hut05].

c) Generalization of AIXI to general Bayes mixtures. Alternatively one can
generalize AIXI to AIξ, where ξ(·) = Σ_{ν∈ℳ} w_ν ν(·) is a general Bayes mixture
of distributions ν in some class ℳ and prior w_ν. If ℳ is the multi-set of
all enumerable semi-measures, then AIξ coincides with AIXI. If ℳ is the
(multi)set of passive semi-computable environments, then AIXI reduces to
Solomonoff's optimal predictor [Hut03d]. The key is not to prove absolute
results for specific problem classes, but to prove relative results of the form "if
there exists a policy with certain desirable properties, then AIξ also possesses
these desirable properties". If there are tasks which cannot be solved by any
policy, AIξ should not be blamed for failing.

d) Intelligence Aspects of AIXI. Intelligence can have many faces. As argued
in [Hut05], it is plausible that AIXI possesses all or at least most properties an
intelligent rational agent should possess. Some of the following properties could
and should be investigated mathematically: creativity, problem solving, pattern
recognition, classification, learning, induction, deduction, building analogies,
optimization, surviving in an environment, language processing, planning.
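
The Bayes mixture ξ(·) = Σ_ν w_ν ν(·) of item c) can be illustrated in the simplest passive case. The sketch below uses a tiny finite class of Bernoulli environments, which is an assumption chosen for readability and is far smaller than the classes discussed in the text; the function names are hypothetical as well. It computes the mixture probability of the data, the posterior weights, and the mixture's one-step prediction.

```python
from typing import Dict, Sequence, Tuple


def posterior_weights(prior: Dict[float, float],
                      data: Sequence[int]) -> Tuple[Dict[float, float], float]:
    """Return (posterior weights, xi(data)) for Bernoulli(theta) environments.

    `prior` maps each theta (probability of a 1) to its prior weight w_theta.
    """
    joint = {}
    for theta, w in prior.items():
        likelihood = 1.0
        for x in data:
            likelihood *= theta if x == 1 else 1.0 - theta
        joint[theta] = w * likelihood          # w_nu * nu(data)
    xi = sum(joint.values())                   # mixture probability xi(data)
    return {theta: j / xi for theta, j in joint.items()}, xi


def mixture_predict_one(prior: Dict[float, float], data: Sequence[int]) -> float:
    """xi-probability that the next bit is 1, given the observed data."""
    post, _ = posterior_weights(prior, data)
    return sum(w * theta for theta, w in post.items())
```

For data with three ones in every four bits, the posterior concentrates on θ = 0.75 within the class {0.25, 0.5, 0.75}, and the mixture prediction approaches 0.75 from below.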

Sources of inspiration can be previously proven loss bounds for Solomonoff sequence
prediction generalized to unbounded horizon, optimality results from the adaptive
control literature, and the asymptotic self-optimizingness results for the related AIξ
model. Value bounds for AIXI are expected to be, in a sense, weaker than the loss
bounds for Solomonoff induction because the problem class covered by AIXI is much
larger than the class of sequence prediction problems.

In the same sense as Gittins' solution to the bandit problem and Laplace's rule
for Bernoulli sequences, AIXI may simply be regarded as (Bayes-)optimal by
construction. Even when accepting this "easy way out", the above questions remain
significant: Theorems relating AIXI to AIμ would no longer be regarded as optimality
proofs of AIXI, but just as statements of how much harder it becomes to operate when μ
is unknown, i.e. progress on the items above is simply reinterpreted.

A weaker goal than to prove optimality of AIXI is to ask for reasonable conver- 
gence properties: 

f) Posterior convergence for unbounded horizon. Convergence of M to μ
holds somewhat surprisingly even for unbounded horizon, which is good news
for AIXI. Unfortunately convergence can be slow, but I expect that convergence
is "reasonably" fast for "slowly" growing horizon, which is important in
AIXI. It would be useful to quantify and prove such a result.

g) Reinforcement learning. Although there is no explicit learning algorithm
built into the AIXI model, AIXI is a reinforcement learning system capable
of receiving and exploiting rewards. The system learns by eliminating Turing
machines q in the definition of M once they become inconsistent with the
progressing history. This is similar to Gold-style learning [Gol67]. For Markov
environments (but not for partially observable environments) there are efficient
general reinforcement learning algorithms, like TD(λ) and Q learning. One
could compare the performance (learning speed and quality) of AIξ to e.g.
TD(λ) and Q learning, extending [PH06b].

h) Posterization. Many properties of Kolmogorov complexity, Solomonoff's
prior, and reinforcement learning algorithms remain valid after "posterization".
With posterization I mean replacing the total value V_{1m}, the weights
w_ν, the complexity K(ν), the environment ν(or_{1:m} | a_{1:m}), etc. by their
"posteriors" V_{km}, w_ν(aor_{<k}), K(ν | aor_{<k}), ν(or_{k:m} | or_{<k} a_{1:m}), etc., where k is the
current cycle and m the lifespan of AIXI. Strangely enough, for w_ν chosen as
2^{-K(ν)} it is not true that w_ν(aor_{<k}) ≈ 2^{-K(ν | aor_{<k})}. If this property were true,
weak bounds like the one proven in [Hut05, Sec.6.2] (which is too weak to be of
practical importance) could be boosted to practical bounds of order 1. Hence,
it is highly important to rescue the posterization property in some way. It may
be valid when grouping together essentially equal distributions ν.

i) Relevant and non-computable environments μ. Assume that the observations
of AIXI contain irrelevant information, like noise. Irrelevance can
formally be defined as being statistically independent of future observations
and rewards, i.e. neither affecting rewards, nor containing information about
future observations. It is easy to see that Solomonoff prediction does not decline
under such noise if it is sampled from a computable distribution. This
likely transfers to AIXI. More interesting is the case where the irrelevant input
is complex. If it is easily separable from the useful input it should not
affect AIXI. On the other hand, even in prediction this problem is non-trivial,
see problem 4g. How robustly does AIXI deal with complex but irrelevant inputs?
A model that explicitly deals with this situation has been developed in
[Hut09b, Hut09a].

j) Grain of truth problem [KL93]. Assume AIXI is used in a multi-agent
setup [Wei00] interacting with other agents. For simplicity I only discuss the
case of a single other agent in a competitive setup, i.e. a two-person zero-sum
game situation. We can entangle agents A and B by letting A observe B's
actions and vice versa. The rewards are provided externally by the rules of the
game. The situation where A is AIXI and B is a perfect minimax player was
analyzed in [Hut05, Sec.6.3]. In multi-agent systems one is mostly interested
in a symmetric setup, i.e. B is also an AIXI. Whereas both AIXIs may be able
to learn the game and improve their strategies (to optimal minimax or more
generally Nash equilibrium), this setup violates one of the basic assumptions.
Since AIXI is incomputable, AIXI(B) does not constitute a computable environment
for AIXI(A). More generally, starting with any class of environments
ℳ, then μ=AIξ_ℳ seems not to belong to class ℳ for most (all?) choices
of ℳ. Various results can no longer be applied, since μ∉ℳ when coupling
two AIξs. Many questions arise: Are there interesting environmental classes
for which AIξ_ℳ ∈ ℳ or AIξtl_ℳ ∈ ℳ? Do AIXI(A/B) converge to optimal
minimax players? Do AIXIs perform well in general multi-agent setups?
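
As a point of reference for the "perfect minimax player" in item j), the minimax solution of a 2x2 zero-sum matrix game has a closed form. The sketch below returns the game value and the row player's optimal probability of playing the first row; it uses exact fractions. The function name and the matrix layout are assumptions for this example, and the code says nothing about how AIXI or AIξ would play.

```python
from fractions import Fraction
from typing import Tuple


def solve_2x2_zero_sum(a, b, c, d) -> Tuple[Fraction, Fraction]:
    """Solve the zero-sum game with row-player payoff matrix [[a, b], [c, d]].

    The row player maximizes, the column player minimizes.
    Returns (value, p) where p is the probability of playing row 0.
    """
    a, b, c, d = map(Fraction, (a, b, c, d))
    maximin = max(min(a, b), min(c, d))   # best guaranteed payoff with pure rows
    minimax = min(max(a, c), max(b, d))   # best guaranteed cap with pure columns
    if maximin == minimax:
        # Saddle point: a pure strategy is optimal for the row player.
        p = Fraction(1) if min(a, b) >= min(c, d) else Fraction(0)
        return maximin, p
    # No saddle point: mix so that the column player is indifferent.
    denom = a - b - c + d
    p = (d - c) / denom
    value = (a * d - b * c) / denom
    return value, p
```

For matching pennies, `solve_2x2_zero_sum(1, -1, -1, 1)` gives value 0 with p = 1/2.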

From the optimality questions above, the first one (5a) is the most important, least
defined, and likely hardest one: In which sense can a rational agent in general
and AIXI in particular be optimal? The multi-agent setting adds another layer of
difficulty: The grain of truth problem (5j) is in my opinion the most important
fundamental problem in game theory and multi-agent systems. Its satisfactory solution
should be worth a Nobel prize or Turing award.

6 Open Problems regarding Uniqueness of AIXI 

As a unification of two optimal theories, it is plausible that AIXI is optimal in the 
"union" of their domains, which has been affirmed but not finally settled by the 
positive results derived so far. In the absence of a definite answer, one should be 
open to alternative models, but no convincing competitor exists to date. Most of 
the following items describe ideas which, if worked out, might result in alternative 
models:

a) Action with expert advice. Expected performance bounds for predictions
based on Solomonoff's prior exist. Inspired by Solomonoff induction, a dual,
currently very popular approach, is "prediction with expert advice" (PEA)
[LW89, Vov92, CBL06]. Whereas PEA performs well in any environment, but
only relative to a given set of experts, Solomonoff's predictor competes with
any other predictor, but only in expectation for environments with computable
distribution. It seems philosophically less compromising to make assumptions
on prediction strategies than on the environment, however weak. PEA has
been generalized to active learning [PH05b, CBL06], but the full reinforcement
learning case is still open [PH06b]. If successful, it could result in a model
dual to AIXI, but I expect the answer to be negative, which on the positive
side would show the distinguishedness of AIXI. Other ad-hoc approaches like
[RH06, RH08a] are also unlikely to be competitive.

b) Actions as random variables. There may be more than one way for the
choice of the generalized M in the AIXI model. For instance, instead of defining
M as in [Hut05] one could treat the agent's actions a also as universally
distributed random variables and then conditionalize M on a.

c) Structure of AIXI. The algebraic properties and the structure of AIXI have
barely been investigated. It is known that the value of AIμ is a linear function
in μ, and the value of AIXI is a convex function in μ, but this is neither very
deep nor very specific to AIXI. It should be possible to extract all essentials
from AIXI which finally should lead to an axiomatic characterization of AIXI.
The benefit is as in any axiomatic approach: It would clearly exhibit the assumptions,
separate the essentials from technicalities, simplify understanding
and, most importantly, guide in finding proofs.

d) Parameter dependence. The AIXI model depends on a few parameters:
the choice of observation and action spaces O and A, the horizon m, and the
universal machine U. So strictly speaking, AIXI is only (essentially) unique,
if it is (essentially) independent of the parameters. I expect this to be true,
but it has not been proven yet. The U-dependence has been discussed in
problem 4h. Countably infinite O and A would provide a rich enough interface
for all problems, but even binary O and A are sufficient by sequentializing
complex observations and actions. For special classes one could choose m→∞
[Ber06]; unfortunately, the universal environment M does not belong to any
of these special classes. See [Hut05, Hut06a, LH07c] for some preliminary
considerations.
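
The "prediction with expert advice" setting of item a) can be illustrated with the exponentially weighted forecaster, a standard member of the PEA family (see e.g. [CBL06]). The sketch below is a minimal version for losses in [0,1]; the function name and the fixed learning rate `eta` are choices made for this example. It covers only the passive prediction case, not the open reinforcement learning case.

```python
import math
from typing import List, Sequence, Tuple


def hedge(loss_rounds: Sequence[Sequence[float]], eta: float) -> Tuple[float, float]:
    """Run the exponentially weighted forecaster on per-round expert losses in [0,1].

    Returns (expected cumulative loss of the forecaster,
             cumulative loss of the best single expert in hindsight).
    """
    n = len(loss_rounds[0])
    weights: List[float] = [1.0] * n
    cumulative: List[float] = [0.0] * n
    total_alg = 0.0
    for losses in loss_rounds:
        z = sum(weights)
        probs = [w / z for w in weights]                 # current mixture over experts
        total_alg += sum(p * l for p, l in zip(probs, losses))
        for i, l in enumerate(losses):
            cumulative[i] += l
            weights[i] *= math.exp(-eta * l)             # penalize lossy experts
    return total_alg, min(cumulative)
```

For T rounds and N experts, the known regret bound for this forecaster is ln(N)/eta + eta·T/8, which for eta = sqrt(8 ln(N)/T) equals sqrt(T ln(N)/2). The bound holds for every loss sequence, which is the sense in which PEA "performs well in any environment, but only relative to a given set of experts".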

7 Open Problems in Defining Intelligence 

A fundamental and long standing difficulty in the field of artificial intelligence is that
(generic) intelligence itself is not well defined. It is an anomaly that nowadays most
AI researchers avoid discussing intelligence, which is caused by several factors: It is a
difficult old subject, it is politically charged, it is not necessary for narrow AI which
focusses on specific applications, AI research is done mainly by computer scientists
who mainly care about algorithms rather than philosophical foundations, and there is the
popular belief that general intelligence is in principle not amenable to a mathematical
definition. These reasons explain but only partially justify the low effort in trying
to define intelligence.

Assume we had a definition, ideally a formal, objective, non-anthropocentric, 
and direct method of measuring intelligence, or at least a very general intelligence- 
like performance measure that could serve as an adequate substitute. This would 
bring the higher goals of the field into tight focus and allow us to objectively com- 
pare different approaches and judge the overall progress. Indeed, formalizing and 
rigorously defining a previously vague concept usually constitutes a quantum leap 
forward in the field: Cf. set theory, logical reasoning, infinitesimal calculus, energy, 
temperature, etc. Of course there is (some) work on defining [LH07a] and testing
[LH07b] intelligence (see [LH07c] for a comprehensive list of references):

The famous Turing test [Tur50, SCA00, Loe90] involves human interaction, so
is unfortunately informal and anthropocentric. Others are large "messy" collections
of existing intelligence tests [BS03, AABL02] ("shotgun" approaches), which are
subjective, lack a clear theoretical grounding, and are potentially too narrow.

There are some more elegant solutions based on classical [Hor02] and algorithmic
[Cha82] information theory ("C-Test" [HOMC98, HO00a, HO00b]), the latter closely
related to Solomonoff's [Sol64] "perfect" inductive inference model. The simple
program in [SD03] reached good IQ scores on some of the more mathematical tests.

One limitation of the C-Test however is that it only deals with compression and 
(passive) sequence prediction, while humans or machines face reactive environments 
where they are able to change the state of the environment through their actions. 
AIXI generalizes Solomonoff to reactive environments, which suggested an extremely 
general, objective, fundamental, and formal performance measure [LH06, Leg08].
This so-called Intelligence Order Relation (IOR) [Hut05] even attracted the popular
scientific press [GR05, Fie05], but the theory surrounding it has not yet been adequately
explored. Here I only describe three non-technical open problems in defining
intelligence.



a) General and specific performance measures. Currently it is only partially
understood how the IOR theoretically compares to the myriad of other
tests of intelligence such as conventional IQ tests or even other performance
tests proposed by other AI researchers. Another open question is whether
the IOR might in some sense be too general. One may narrow the IOR to
specific classes of problems [LH04b] and examine how the resulting IOR measures
compare to standard performance measures for each problem class. This
could shed light on aspects of the IOR and possibly also establish connections
between seemingly unrelated performance metrics for different classes of
problems.



b) Practical performance measures. A more practically orientated line of
investigation would be to produce a resource bounded version of the IOR like
the one in [Hut05, Sec.7], or perhaps some of its special cases. This would
allow one to define a practically implementable performance test, similar to
the way in which the C-Test has been derived from incomputable definitions
of compression using Kt complexity [HO00a]. As there are many subtle kinds
of resource bounded complexity [LV08], the advantages and disadvantages of
each in this context would need to be carefully examined. Another possibility
is the recent Speed Prior [Sch02b] or variants of this approach.

c) Experimental evaluation. Once a computable version of the IOR has been
defined, one could write a computer program that implements it. One could
then experimentally explore its characteristics in a range of different problem
spaces. For example, it might be possible to find correlations with IQ test
scores when applied to humans, as has been done with the C-Test [HOMC98].
Another possibility would be to consider more limited domains like classification
problems or sequence prediction problems and to see whether the relative
performance of algorithms according to the IOR agrees with standard performance
measures and real world performance.

A comprehensive collection, discussion and comparison of verbal and formal intelligence
tests, definitions, and measures can be found in [LH07c].

8 Conclusions 

The flavor of the open questions. While most of the key questions about universal
sequence prediction have been solved, many key questions about universal AI
remain open to date. The questions in Sections 4-7 are centered around the AIT
approach to induction and AI, but many require interdisciplinary work. A more
detailed account with technical details can be found in the book [Hut05] and paper
[Hut07b]. Most questions are amenable to a rigorous mathematical treatment, including
the more philosophical or vague sounding ones. Progress on the latter
can be achieved in the usual way by cycling through (i) craft or improve mathematical
definitions that resemble the intuitive concepts to be studied (e.g. "natural",
"generalization", "optimal"), (ii) formulate or adapt a mathematical conjecture resembling
the informal question, (iii) (dis)prove the conjecture. Some questions are
about approximating, implementing, and testing various ideas and concepts. Technically,
many questions are on (the interface between) and exploit techniques used in
(algorithmic) information theory, machine learning, Bayesian statistics, (adaptive)
control theory, and reinforcement learning.






Feasibility, difficulty, and interestingness of the open questions. I concentrated
on questions whose answers probably help to develop the foundations of
universal induction and UAI. Some problems are very hard, and their satisfactory
solution is worth a Nobel prize or Turing award, e.g. problem 5j. I included those
questions that looked promising and interesting at the time of writing this article. In the
following I try to estimate their relative feasibility, difficulty, and interestingness:



• Problems roughly sorted from most important or interesting to least: 




• Problems roughly sorted from most to least time consuming: 




• Problems roughly sorted from hard to easy: 




These rankings hopefully do not mislead but give the interested reader some guid- 
ance where (not) to start. The final paragraphs of this article are devoted to the 
role UAI plays in the grand goal of AI. 

Other approaches to AI. There are many fields that try to understand the phenomenon
of intelligence and whose insights help in creating intelligent systems: Cognitive
psychology and behaviorism [SMM07], philosophy of mind [Cha02, Sea05], neuroscience
[HB04], linguistics [Hau01, Cho06], anthropology [Par07], machine learning
[SB98, Bli06], logic [Tur84, Oo87], computer science [TTJOl, RN03], biological evolution
[Kar07], and others. In computer science, most AI research is bottom-up:
extending and improving existing or developing new algorithms and increasing their
range of applicability; an interplay between experimentation on toy problems and
theory, with occasional real-world applications. The agent perspective of AI [RN03]
brings some order and unification in the large variety of problems the field wants
to address, but it is only a framework rather than a complete theory. In the absence
of a perfect (stochastic) model of the environment, machine learning techniques are
needed and employed. Apart from AIXI, there is no general theory for learning
agents. This resulted in an ever increasing number of limited models and algorithms
in the past.

The information-theoretic approach to AI. Solomonoff induction and AIXI
are mathematical top-down approaches. The price for this generality is that the
full models are computationally intractable, and investigations have to be mostly
theoretical at this stage. From a different perspective, UAI strictly separates the
conceptual and algorithmic AI questions. Two analogies may help: Von Neumann's
optimal minimax strategy [NM44] is a conceptual solution of zero-sum games, but
is infeasible for most interesting zero-sum games. Nevertheless most algorithms
are based on approximations of this ideal. In physics, the quest for a "theory
of everything" (TOE) led to extremely successful unified theories, despite their
computational intractability [GSW00, Gre00]. The role of UAI in AI should be
understood as analogous to the role of minimax in zero-sum games or of the TOE
in physics.

Epilogue. As we have seen, algorithmic information theory offers answers to the
following two key scientific questions: (1) The problem of induction, which is what
science itself is mostly about: Induction ≈ finding regularities in data ≈ understanding
the world ≈ science. (2) Understanding intelligence, the key property that
distinguishes humans from animals and inanimate things.

This modern mathematical approach to both questions (1) and (2) is quite dif- 
ferent to the more traditional philosophical, logic-based, engineering, psychological, 
or neurological approaches. Among the few other mathematical approaches, none 
captures rational intelligence as completely as the AIXI model does. Still, a lot of 
questions remain open. Raising and discussing them was the primary focus of this 
article. 

Imagine a complete practical solution of the AI problem (by the next generation 
or so), i.e. systems that surpass human intelligence. This would transform society 
more than the industrial revolution two centuries ago, the computer last century, 
and the internet this century. Although individually, some questions I raised seem 
quite technical and narrow, they derive their significance from their role in a truly 
outstanding scientific endeavor. As with most innovations, the social benefit of
course depends on its benevolent use.